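We propose a novel action sequence planner based on the combination of affordance recognition and a neural forward model that predicts the effects of affordance execution. By performing affordance recognition on predicted futures, we avoid relying on explicit affordance effect definitions for multi-step planning. Because the system learns affordance effects from experience data, it can foresee not only the canonical effects of an affordance but also situation-specific side effects. This allows the system to avoid planning failures caused by such non-canonical effects, and to exploit non-canonical effects to achieve a given goal. We evaluate the system in simulation on a set of test tasks that require considering both canonical and non-canonical affordance effects.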
Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.
Federated learning is a collaborative model training method that iterates model updates at multiple clients and aggregates the updates at a central server. Device and statistical heterogeneity of the participating clients cause performance degradation, so an appropriate weight should be assigned to each client in the server's aggregation phase. This paper employs deep unfolding to learn weights that adapt to the heterogeneity, yielding a model with high accuracy on uniform test data. The results of numerical experiments indicate the high performance of the proposed method and the interpretable behavior of the learned weights.
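The server-side aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes each client's model is a flat list of floats and that the per-client weights are already given; the function name `aggregate` is hypothetical, and the deep-unfolding procedure that learns the weights is not shown here.

```python
from typing import List, Sequence


def aggregate(client_params: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Weighted average of client parameter vectors (illustrative only).

    Assumption: weights are non-negative and are normalized here so that
    they sum to one before averaging.
    """
    if len(client_params) != len(weights):
        raise ValueError("need exactly one weight per client")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")

    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for params, w in zip(client_params, weights):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter size")
        for i, p in enumerate(params):
            aggregated[i] += (w / total) * p
    return aggregated


# Two clients; the second is trusted three times as much as the first.
global_params = aggregate([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

With uniform weights this reduces to a plain average of the client models; non-uniform weights let the server down-weight clients whose updates are less representative.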
We consider task allocation for multi-object transport using a multi-robot system, in which each robot selects one object among multiple objects with different and unknown weights. The existing centralized methods assume the number of robots and tasks to be fixed, which is inapplicable to scenarios that differ from the learning environment. Meanwhile, the existing distributed methods limit the minimum number of robots and tasks to a constant value, making them applicable to various numbers of robots and tasks. However, they cannot transport an object whose weight exceeds the load capacity of robots observing the object. To make it applicable to various numbers of robots and objects with different and unknown weights, we propose a framework using multi-agent reinforcement learning for task allocation. First, we introduce a structured policy model consisting of 1) predesigned dynamic task priorities with global communication and 2) a neural network-based distributed policy model that determines the timing for coordination. The distributed policy builds consensus on the high-priority object under local observations and selects cooperative or independent actions. Then, the policy is optimized by multi-agent reinforcement learning through trial and error. This structured policy of local learning and global communication makes our framework applicable to various numbers of robots and objects with different and unknown weights, as demonstrated by numerical simulations.
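Specific emitter identification (SEI) is a highly promising technology for physical-layer authentication and one of the most critical complements to upper-layer authentication. SEI relies on radio frequency (RF) features that arise from circuit differences rather than on cryptography. These features are inherent characteristics of the hardware circuits and are difficult to counterfeit. Recently, various deep learning (DL)-based conventional SEI methods have been proposed and have achieved advanced performance. However, these methods were designed for closed-set scenarios with a large number of RF signal samples for training, and they perform poorly when training samples are limited. We therefore focus on few-shot SEI (FS-SEI) for aircraft identification via automatic dependent surveillance-broadcast (ADS-B) signals, and propose a novel FS-SEI method based on deep metric ensemble learning (DMEL). Specifically, the proposed method consists of feature embedding and classification. The former is based on metric learning with a complex-valued convolutional neural network (CVCNN) and extracts discriminative features with compact intra-category distances and separable inter-category distances, while the latter is realized by an ensemble classifier. Simulation results show that when the number of samples per category exceeds 5, the average accuracy of the proposed method is higher than 98%. In addition, feature visualization demonstrates the advantages of the proposed method in terms of discriminability and generalization. The code for this paper can be downloaded from GitHub (https://github.com/beechburgpiestar/few-shot-specific-emitter-emitter-istifification-via-deep-metric-metric-semble-learning).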
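Computing Bayesian posteriors and model evidence typically requires numerical integration. Bayesian quadrature (BQ), a surrogate-model-based approach to numerical integration, is capable of excellent sample efficiency, but its lack of parallelization has hindered its practical application. In this work, we propose a parallelized (batch) BQ method that employs techniques from kernel quadrature and has a provably exponential convergence rate. In addition, as with nested sampling, our method permits simultaneous inference of both posteriors and model evidence. Samples from the BQ surrogate model are re-selected by a kernel recombination algorithm to give a sparse set of samples, requiring negligible additional time to increase the batch size. Empirically, we find that our approach significantly outperforms the sampling efficiency of both state-of-the-art BQ techniques and nested sampling on various real-world datasets, including lithium-ion battery analytics.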
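In edge computing, where computational resources (speed, memory size, and power) are limited, suppressing data size is a challenge for machine learning models that perform complex tasks such as autonomous driving. Efficient lossy compression of matrix data has been introduced by decomposing a matrix into the product of an integer matrix and a real matrix. However, its optimization is difficult because it requires optimizing integer and real variables simultaneously. In this paper, we improve this optimization by using recently developed black-box optimization (BBO) algorithms with an Ising solver for the integer variables. In addition, the algorithm can be used to solve mixed-integer programming problems that are linear in the real variables and non-linear in the integer variables. The differences between the choice of Ising solver (simulated annealing, quantum annealing, and simulated quenching) and the strategies of the BBO algorithms (BOCS, FMQA, and their variations) are discussed for the further development of BBO techniques.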
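Quadratic unconstrained binary optimization (QUBO) solvers can be applied to design an optimal structure that avoids resonance. QUBO algorithms running on classical or quantum devices have succeeded in some industrial applications. However, their applications remain limited because of the difficulty of transforming the original optimization problem into a QUBO. Recently, black-box optimization (BBO) methods have been proposed to tackle this issue, using a machine learning technique and a Bayesian treatment for combinatorial optimization. We employed BBO methods to design a printed circuit board for resonance avoidance. The design problem is formulated to maximize the natural frequency while simultaneously minimizing the number of mounting points. The natural frequency, which is the bottleneck for the QUBO formulation, is approximated by a quadratic model in the BBO method. We demonstrate that BBO using a factorization machine performs well in both computation time and the success probability of finding the optimal solution. Our results can open up the potential of QUBO solvers for other applications in structural design.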
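For readers unfamiliar with the QUBO form mentioned in this abstract, the following minimal sketch shows what a QUBO solver minimizes: the energy $x^\top Q x$ over binary vectors $x$. It uses exhaustive search, which is only feasible for tiny problems and is a stand-in for the annealing-based solvers the abstract refers to; the function names and the example matrix are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import List, Sequence, Tuple


def qubo_energy(Q: Sequence[Sequence[float]], x: Sequence[int]) -> float:
    """Return x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_qubo_brute_force(Q: Sequence[Sequence[float]]) -> Tuple[Tuple[int, ...], float]:
    """Enumerate all 2^n binary vectors and return a minimizer and its energy."""
    n = len(Q)
    best_x: Tuple[int, ...] = tuple([0] * n)
    best_e = float("inf")
    for bits in product((0, 1), repeat=n):
        e = qubo_energy(Q, bits)
        if e < best_e:  # strict comparison keeps the first minimizer found
            best_x, best_e = bits, e
    return best_x, best_e


# Toy instance: each variable has reward -1, adjacent variables have penalty +2.
Q_example: List[List[float]] = [
    [-1.0, 2.0, 0.0],
    [0.0, -1.0, 2.0],
    [0.0, 0.0, -1.0],
]
best_x, best_e = solve_qubo_brute_force(Q_example)
```

In this toy instance the minimizer selects the first and third variables, since selecting adjacent variables incurs the penalty. In a BBO setting, the entries of `Q` would come from a surrogate model fitted to evaluations of the true objective rather than being written down by hand.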
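Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective, because repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framework interprets a number of existing approaches as special cases and elucidates the bias-variance trade-off of Hessian estimates. The framework also opens the door to a new family of estimates that can be easily implemented with automatic differentiation libraries and that lead to performance gains in practice.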